Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms with a focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery actions and ensure demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep task, shaft-reach task, and shaft-insertion task) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterisation of behavior. Results show significant improvement in task performance, through improved flexibility, robustness, and demonstration feasibility.
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning tasks structurally similar to those used to construct the skill space. We first propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards sampling only skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation, enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.
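In this paper, we present VDTTS, a Visually-Driven Text-to-Speech model. Motivated by dubbing, VDTTS takes advantage of video frames as an additional input alongside text, and generates speech that matches the video signal. We demonstrate how this allows VDTTS, unlike plain TTS models, to generate speech that not only has prosodic variations such as natural pauses and pitch, but is also synchronized to the input video. Experimentally, we show that our model produces well-synchronized outputs, approaching the video-speech synchronization quality of the ground truth, on several challenging benchmarks, including "in-the-wild" content from VoxCeleb2. We encourage the reader to view the demo videos demonstrating video-speech synchronization, robustness to speaker ID swapping, and prosody.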
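Deep learning-based models, such as recurrent neural networks (RNNs), have been applied to various sequence learning tasks with great success. Following this, these models are increasingly replacing classic approaches in object tracking applications for motion prediction. On the one hand, these models can capture complex object dynamics with less modeling required; on the other hand, they depend on a large amount of training data for parameter tuning. Towards this end, we present an approach for generating synthetic trajectory data of unmanned aerial vehicles (UAVs) in image space. Since UAVs, or rather quadrotors, are dynamical systems, they cannot follow arbitrary trajectories. With the prerequisite that UAV trajectories fulfill a smoothness criterion corresponding to a minimal change of higher-order motion, methods for planning aggressive quadrotor flights can be utilized to generate optimal trajectories through a sequence of 3D waypoints. By projecting these maneuver trajectories, which are suitable for controlling quadrotors, to image space, a versatile trajectory data set is realized. To demonstrate the applicability of the synthetic trajectory data, we show that an RNN-based prediction model trained on the generated data can outperform classic reference models on a real-world UAV tracking dataset. The evaluation is done on the publicly available Anti-UAV dataset.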
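The projection step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera at the origin looking along +Z; the function name `project_to_image` and the intrinsics (`focal_px`, `cx`, `cy`) are hypothetical values chosen for illustration and are not taken from the paper, which may use a different camera model.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_to_image(
    points_3d: Sequence[Point3D],
    focal_px: float = 800.0,  # assumed focal length in pixels (hypothetical)
    cx: float = 320.0,        # assumed principal point, x (hypothetical)
    cy: float = 240.0,        # assumed principal point, y (hypothetical)
) -> List[Optional[Pixel]]:
    """Project 3D trajectory points (camera frame) to pixel coordinates.

    Points at or behind the camera plane (z <= 0) cannot be projected and
    are returned as None.
    """
    pixels: List[Optional[Pixel]] = []
    for x, y, z in points_3d:
        if z <= 0:
            pixels.append(None)
            continue
        u = focal_px * x / z + cx  # perspective division, then shift to image center
        v = focal_px * y / z + cy
        pixels.append((u, v))
    return pixels


if __name__ == "__main__":
    # A short 3D maneuver moving away from the camera while drifting sideways.
    trajectory = [(0.0, 0.0, 5.0), (0.5, -0.2, 7.5), (1.0, -0.5, 10.0)]
    for pixel in project_to_image(trajectory):
        print(pixel)
```

Applying such a projection to many smooth 3D trajectories yields image-space tracks that could serve as synthetic training data in the spirit described above.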
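In applications such as object tracking, time-series data inevitably carry missing observations. Following the success of deep learning-based models for various sequence learning tasks, these models increasingly replace classic approaches in object tracking applications for inferring the objects' motion states. While traditional tracking approaches can deal with missing observations, most of their deep counterparts are, by default, not suited for this. Towards this end, this paper introduces a transformer-based approach for handling missing observations in variable-input-length trajectory data. The model is formed indirectly by successively increasing the complexity of the demanded inference tasks. Starting from reproducing noise-free trajectories, the model then learns to infer trajectories from noisy inputs. By providing missing tokens (binary-encoded missing events), the model learns to in-attend to missing data and infers a complete trajectory conditioned on the remaining inputs. In the case of a sequence of successive missing events, the model then acts as a pure prediction model. The abilities of the approach are demonstrated on synthetic data and real-world data reflecting prototypical object tracking scenarios.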
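One way to picture the binary-encoded missing events described above is to append a flag to every trajectory step. The sketch below is an assumption-laden illustration, not the paper's encoding: the function name `encode_with_missing_flags`, the 2D positions, and the zero placeholder for missing steps are all hypothetical choices.

```python
from typing import List, Optional, Sequence, Tuple

Observation = Optional[Tuple[float, float]]
Token = Tuple[float, float, float]


def encode_with_missing_flags(
    trajectory: Sequence[Observation],
    placeholder: Tuple[float, float] = (0.0, 0.0),  # assumed fill value (hypothetical)
) -> List[Token]:
    """Turn a trajectory with gaps into fixed-width tokens (x, y, missing_flag).

    Observed steps get flag 0.0. Missing steps (None) get the placeholder
    position and flag 1.0, so a downstream sequence model can tell a real
    observation at the placeholder position apart from a missing one.
    """
    tokens: List[Token] = []
    for obs in trajectory:
        if obs is None:
            tokens.append((placeholder[0], placeholder[1], 1.0))
        else:
            tokens.append((float(obs[0]), float(obs[1]), 0.0))
    return tokens


if __name__ == "__main__":
    track = [(1.0, 2.0), None, None, (1.6, 2.9)]
    for token in encode_with_missing_flags(track):
        print(token)
```

A run of trailing missing steps then corresponds to the pure prediction setting mentioned in the abstract, since the model receives only flags for those steps.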
Scenarios requiring humans to choose from multiple seemingly optimal actions are commonplace; however, standard imitation learning often fails to capture this behavior. Instead, an over-reliance on replicating expert actions induces inflexible and unstable policies, leading to poor generalizability in an application. To address the problem, this paper presents the first imitation learning framework that incorporates Bayesian variational inference for learning flexible non-parametric multi-action policies, while simultaneously robustifying the policies against sources of error by introducing and optimizing disturbances to create a richer demonstration dataset. This combinatorial approach forces the policy to adapt to challenging situations, enabling stable multi-action policies to be learned efficiently. The effectiveness of our proposed method is evaluated through simulations and real-robot experiments for a table-sweep task using the UR3 6-DOF robotic arm. Results show that, through improved flexibility and robustness, the learning performance and control safety are better than those of comparison methods.
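In tasks such as tracking, time-series data inevitably carry missing observations. While traditional tracking approaches can handle missing observations, recurrent neural networks (RNNs) are designed to receive input data at every step. Furthermore, current solutions for RNNs, such as omitting the missing data or data imputation, are not sufficient to account for the resulting uncertainty. Towards this end, this paper introduces an RNN-based approach that provides a full temporal filtering cycle for motion state estimation. The Kalman filter-inspired approach can deal with missing observations and outliers. To provide a full temporal filtering cycle, a basic RNN is extended to take observations and the associated belief about their accuracy into account when updating the current state. An RNN prediction model, which generates a parametrized distribution to capture the predicted states, is combined with an RNN update model, which relies on the prediction model output and the current observation. By providing the model with masking information (binary-encoded missing events), the model can overcome the limitations of standard techniques for dealing with missing input values. The model's abilities are demonstrated on synthetic data reflecting prototypical pedestrian tracking scenarios.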
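For orientation, the classical filtering cycle that inspires the approach above can be sketched in a few lines. This is a plain scalar Kalman filter with a random-walk motion model, not the RNN-based model from the paper; the function name `filter_with_missing` and the noise variances are assumptions made for illustration. A missing observation is represented by `None` and simply skips the update step.

```python
from typing import List, Optional, Sequence, Tuple


def filter_with_missing(
    observations: Sequence[Optional[float]],
    process_var: float = 0.01,  # assumed motion-model noise variance (hypothetical)
    obs_var: float = 0.25,      # assumed measurement noise variance (hypothetical)
    init_mean: float = 0.0,
    init_var: float = 1.0,
) -> List[Tuple[float, float]]:
    """Run a predict/update cycle per step and return (mean, variance) pairs."""
    mean, var = init_mean, init_var
    estimates: List[Tuple[float, float]] = []
    for z in observations:
        # Predict: the random-walk model keeps the mean and inflates the variance.
        var = var + process_var
        if z is not None:
            # Update: blend prediction and observation using the Kalman gain.
            gain = var / (var + obs_var)
            mean = mean + gain * (z - mean)
            var = (1.0 - gain) * var
        # When z is None, the update is skipped and uncertainty stays inflated.
        estimates.append((mean, var))
    return estimates


if __name__ == "__main__":
    for mean, var in filter_with_missing([1.0, None, None, 1.2]):
        print(f"mean={mean:.3f} var={var:.3f}")
```

During the two missing steps the variance grows while the mean is held, which is the kind of uncertainty that omitting or imputing the missing data does not represent.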
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.